

Section: New Results

Simplification and Run-time Resolution of Data Dependence Constraints for Loop Transformations

Participants: Diogo Nunes Sampaio, Alain Ketterlin [Inria CAMUS], Louis-Noël Pouchet [CSU, USA], Fabrice Rastello.

Loop optimizations such as tiling, thread-level parallelization, and vectorization are essential transformations for improving performance. Their use relies on the ability to compute dependence information at compile time to assess their validity, but in many real situations dependence analysis fails to provide sufficiently precise information. This typically happens when working on a compiler's intermediate representation (e.g., LLVM IR) or on legacy source code that uses pointers and linearized arrays (e.g., packed symmetric matrices in BLAS/LAPACK). In such cases, the compiler is often unable to apply aggressive transformations because static dependence analysis is inconclusive.
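As an illustration of the kind of code in question, the following sketch shows a loop nest over a packed upper-triangular matrix. It is not taken from the paper: the names `packed_index` and `packed_add` are hypothetical, and the layout assumed is the usual 0-based column-major upper packed storage, in which element (i, j) with i <= j is stored at offset i + j(j+1)/2.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), i <= j, in 0-based column-major upper packed
   storage. The j*(j+1)/2 term makes the subscript quadratic in j, so it
   falls outside the affine subscripts that classic polyhedral dependence
   analysis handles. */
size_t packed_index(size_t i, size_t j) {
    assert(i <= j);
    return i + j * (j + 1) / 2;
}

/* Hypothetical kernel: adds the packed upper triangle of src into dst.
   Nothing tells the compiler that dst and src do not overlap, so every
   write to dst may depend on a read from src (a may-dependence). */
void packed_add(double *dst, const double *src, size_t n) {
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i <= j; i++) {
            dst[packed_index(i, j)] += src[packed_index(i, j)];
        }
    }
}
```

The loop bounds here are affine (0 <= i <= j < n) while the memory access is polynomial, which is the combination that makes a purely static analysis inconclusive.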

This work makes a fundamental leap towards enabling complex loop transformations in real-life scenarios by using a hybrid static and dynamic analysis to disambiguate may-dependencies. As in GCC's auto-vectorization, our approach adds a lightweight run-time test that checks whether the ambiguous may-dependencies actually exist at execution time, and thereby determines whether the optimized or the unmodified version of the code should be called. The main contribution of our work is to generalize this pragmatic approach to a large class of loop-nest transformations, including tiling, loop-invariant code motion, and parallelization. In particular, we design a quantifier elimination scheme on integer multivariate polynomials, which helps apply off-the-shelf polyhedral transformations to a larger class of programs: those with polynomial memory accesses and affine loop bounds.
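The versioning idea can be sketched in its simplest form as follows. This is only the basic range-overlap check in the style of a vectorizer's alias test, not the quantifier elimination scheme described above; all function names are hypothetical, and the "optimized" version is a stand-in that merely reverses the iteration order, which is legal only when no dependence exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Run-time test: returns 1 when the n doubles written through dst cannot
   overlap the n doubles read through src. Sketch only: it compares
   addresses as integers and ignores arithmetic overflow. */
int ranges_disjoint(const double *dst, const double *src, size_t n) {
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)(n * sizeof(double));
    return d + bytes <= s || s + bytes <= d;
}

/* Unmodified version: keeps the original iteration order, so it is
   correct whatever the aliasing between dst and src. */
void add_one_original(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] + 1.0;
    }
}

/* Stand-in for a transformed version: iterates in reverse, which
   preserves the result only if dst and src do not overlap. */
void add_one_optimized(double *restrict dst, const double *restrict src,
                       size_t n) {
    for (size_t i = n; i-- > 0;) {
        dst[i] = src[i] + 1.0;
    }
}

/* Dispatcher: returns 1 if the optimized version ran, 0 otherwise. */
int add_one(double *dst, const double *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (ranges_disjoint(dst, src, n)) {
        add_one_optimized(dst, src, n);
        return 1;
    }
    add_one_original(dst, src, n);
    return 0;
}
```

Calling `add_one(a + 1, a, 3)` on a single array `a` creates a genuine loop-carried dependence, so the test fails and the original loop runs; with two separate arrays the test succeeds and the transformed loop is used. The cost of the test is a handful of integer comparisons executed once, outside the loop nest.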

The precision of the presented scheme and the low run-time overhead of the test are key to making this approach realistic. We experimentally validate our technique on 25 benchmarks using complex loop transformations, achieving negligible overhead. Precision is assessed by the observed success of the generated tests in practical cases.

The IPFME tool (Section 5.5) was developed in this context. This work is the fruit of the collaboration with OSU (Section 8.4.1.1). It was presented at the ACM/SIGARCH International Conference on Supercomputing, ICS 2017 [25].